(57) Abstract

An improved method is described for lossy compression
of images, so that visual distortion is reduced for a given
compressed bit-rate or, equivalently, a lower bit-rate is required
for a given level of visual distortion. The image is decomposed
using a Wavelet transform or other space-frequency transform.
The frequency bands are partitioned into small blocks (e.g. 
blocks 32 by 32 samples). The blocks are independently 
quantized and coded using an embedded block coder, so that 
each block bit-stream contains a large number of finely spaced 
truncation points. A visual distortion measure is computed for 
each block at each truncation point, where the metric is sensitive 
to the masking properties of the Human Visual System; that is,
quantization errors in regions where the relevant frequency band
already contains substantial activity are assigned a smaller visual
distortion than comparable quantization errors in regions where
the frequency band exhibits little activity.

Method for Visual Optimisation of Embedded Block Codes to
Exploit Visual Masking Phenomena

Field of the Invention

The present invention relates to the lossy compression of still images
and, more particularly, to an improved method for assigning bits to different
spatial and frequency portions of the compressed image so as to maximise
perceived visual quality.

Background of the Invention 

Conventional image compression systems, such as that represented by
the well-known baseline JPEG standard, suffer from a number of problems of
which the following three are notable.

1) They are unable to exploit visual masking and other properties of the
Human Visual System (HVS) which vary spatially with image content.
This is because the quantization parameters used by these algorithms are
constant over the extent of the image. As a result, images cannot be
compressed as efficiently as might be expected if visual masking were
taken into account.

2) To achieve a target bit-rate or visual quality, the image must be
compressed multiple times, while varying one or more of the quantization
parameters in an iterative fashion. This is known as the rate-control
problem and it enters into many practical image compression
applications, including the compression of digital camera images and
page compression to save memory within printers, scanners and other
such peripheral devices.

3) The target bit-rate and desired viewing resolution must be known prior to
compression. By contrast, for many applications, a scalable bit-stream is
highly desirable. A scalable bit-stream is one which may be partially
transmitted or decompressed so as to reconstruct the image with lower
quality or at a lower resolution, such that the quality of the reconstructed
image is comparable to that which would have been achieved if the
relevant bit-rate and resolution were known when the image was
compressed. Obviously, this is a desirable property for compressed image
databases, which must allow remote clients access to the image at the
resolution and bit-rate (i.e. download time) of their choice. Scalability is
also a key requirement for robust transmission of images over noisy
channels. The simplest and most commonly understood example of a
scalable bit-stream is a so-called "progressive" bit-stream. A progressive
bit-stream has the property that it can be truncated to any length and the
quality of the reconstructed image should be comparable to that which
could have been achieved if the image had been compressed to the
truncated bit-rate from the outset. Scalable image compression clearly
represents one way of achieving non-iterative rate-control and so
addresses the concerns of item 2) above.

A number of solutions have been proposed to each of these problems.
The APIC image compression system (Hontsch and Karam, "APIC: Adaptive
Perceptual Image Coding Based on Sub-band Decomposition with Locally
Adaptive Perceptual Weighting," International Conference on Image
Processing, vol. 1, pp. 37-40, 1997) exploits visual masking in the Wavelet
transform domain, through the use of an adaptive quantizer, which is driven
by the causal neighbourhood of the sample being quantized, consisting of
samples from the same sub-band. The approach has a number of drawbacks:
it is inherently not scalable; iterative rate-control is required; and the
masking effect must be estimated from a causal neighbourhood of the sample
being quantized, in place of a symmetric neighbourhood which would model
the HVS more accurately. On the other hand, a variety of solutions have
been proposed to the second and third problems. Some of the more relevant
examples are the SPIHT (A. Said and W. Pearlman, "A New, Fast and
Efficient Image Codec based on Set Partitioning in Hierarchical Trees," IEEE
Trans. on Circuits and Systems for Video Technology, vol. 6, no. 3, pp. 243-
250, June 1996) and EBCOT (D. Taubman, "EBCOT: Embedded Block Coding
with Optimised Truncation," ISO/IEC JTC 1/SC 29/WG1 N1020R, October 21,
1998) image compression methods. These both produce highly scalable bit-
streams and directly address the rate-control problem; however, they focus
on minimising Mean Squared Error (MSE) between the original and
reconstructed images, rather than minimising visual distortion. Some
attempts have been made to exploit properties of the HVS within the context
of SPIHT and other scalable compression frameworks; however, these
approaches focus on spatially uniform properties such as the Contrast
Sensitivity Function (CSF), and are unable to adapt spatially to exploit the
important phenomenon of visual masking. The compression system
proposed by Mazzarri and Leonardi (A. Mazzarri and R. Leonardi,
"Perceptual Embedded Image Coding using Wavelet Transforms,"
International Conference on Image Processing, vol. 1, pp. 586-589, 1995) is an
example of this approach. Also worth mentioning here is the method
proposed by Watson (A. B. Watson, "DCT Quantization Matrices Visually
Optimized for Individual Images," Proceedings of the SPIE, vol. 1913, pp. 202-
216, 1993) for optimising quantization tables in the baseline JPEG image
compression system. Although this method is restricted to space-invariant
quantization parameters and non-scalable compression, by virtue of its
reliance on the baseline JPEG compression standard, it does take visual
masking and other properties of the HVS into account in designing a global
set of quantization parameters. The visual models used in the current
invention are closely related to those used by Watson and those used in
APIC.

Embedded Block Coding 

Embedded block coding is a method of partitioning samples from the
frequency bands of a space-frequency representation of the image into a
series of smaller blocks and coding the blocks such that the bit-stream in
each block can be truncated at a length selected to provide a particular
distortion level. To achieve embedded block coding, the image is first
decomposed into a set of distinct frequency bands using a Wavelet transform,
Wavelet packet transform, Discrete Cosine Transform, or any number of
other space-frequency transforms which will be familiar to those skilled in
the art. The basic idea is to further partition the samples in each band into
smaller blocks, which we will denote by the symbols $B_1, B_2, B_3, \ldots$ The
particular band to which each of these blocks belongs is immaterial to the
current discussion. The samples in each block are then coded
independently, generating a progressive bit-stream for each block, $B_i$, which
can be truncated to any of a set of distinct lengths, $R_i^1, R_i^2, \ldots, R_i^{N_i}$, prior to
decoding. Efficient block coding engines, which are able to produce a finely
gradated set of truncation points, $R_i^n$, such that each truncated bit-stream
represents an efficient coding of the small independent block of samples, $B_i$,
have been introduced only recently as part of the EBCOT image compression
system. A discussion of the techniques involved in generating such
embedded block bit-streams is inappropriate and unnecessary here, since the
present invention does not rely upon the specific mechanism used to code
each block of samples, but only upon the existence of an efficient, fine
embedding, for independently coded blocks of samples from each frequency
band.

The motivation for considering embedded block coding is that each
block may be independently truncated to any desired length in order to
optimise the trade-off between the size of the overall compressed bit-stream
representing the image and the distortion associated with the image which
can be reconstructed from this bit-stream. In the simplest incarnation of the
idea, each block bit-stream is truncated to one of the available lengths, $R_i^n$, in
whatever manner is deemed most appropriate, after which the truncated bit-
streams are concatenated in some pre-determined order, including sufficient
auxiliary information to identify the truncation point, $n_i$, and length, $R_i^{n_i}$,
associated with each block. Evidently, this provides an elegant solution to
the rate-control problem described above. In more sophisticated incarnations
of the idea, the overall compressed bit-stream might be organised into
successively higher quality "layers", where each layer contains incremental
contributions from the embedded bit-stream of each block, such that layers 1
through $l$ together contain the initial $R_i^{n_i^l}$ bytes from code-block $B_i$, for each
$l = 1, 2, 3, \ldots$ The truncation points, $n_i^l$, associated with each block and each
layer may be independently selected, subject only to the requirement that
$n_i^l \ge n_i^{l-1}$, which is not restrictive in practice. The EBCOT image compression
system provides a mechanism for efficiently embedding a large number of
layers in a single compressed image bit-stream, thereby generating a highly
scalable representation of the image. In addition to scalability, bit-streams
generated in this way possess important properties, including the ability to
decompress arbitrary portions of only those code-blocks which are required
to reconstruct a limited spatial region within the image. This is identified as
a "random access" property.
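
The layered organisation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `build_layers` and `truncate_to_layer` are hypothetical, block bit-streams are modelled as plain `bytes`, and the auxiliary signalling of truncation points is omitted.

```python
from typing import Dict, List


def build_layers(streams: Dict[int, bytes],
                 layer_lengths: Dict[int, List[int]]) -> List[Dict[int, bytes]]:
    """Split each block's embedded bit-stream into per-layer increments.

    streams[i]          -- full embedded bit-stream of code-block i
    layer_lengths[i][l] -- cumulative truncation length (in bytes) of block i
                           at layer l; must be non-decreasing in l
    """
    num_layers = len(next(iter(layer_lengths.values())))
    layers: List[Dict[int, bytes]] = [dict() for _ in range(num_layers)]
    for i, stream in streams.items():
        prev = 0
        for l, length in enumerate(layer_lengths[i]):
            if length < prev or length > len(stream):
                raise ValueError(f"invalid truncation length for block {i}")
            # Layer l carries only the bytes added since the previous layer.
            layers[l][i] = stream[prev:length]
            prev = length
    return layers


def truncate_to_layer(layers: List[Dict[int, bytes]], block: int, upto: int) -> bytes:
    """Concatenate one block's contributions from layers 0..upto inclusive."""
    return b"".join(layers[l][block] for l in range(upto + 1))
```

Discarding trailing layers then amounts to calling `truncate_to_layer` with a smaller `upto`; each block's prefix is recovered without touching the other blocks.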

Embedded block coding also has important consequences for keeping
implementation memory requirements down. This is because the space-
frequency transform itself generally has localised memory requirements and
the block coding process is also highly localised. Even though all blocks in
the image (or at least a large fraction of them) must generally be considered
to optimally select the truncation points, $n_i^l$, for each block, $B_i$, in each
layer, $l$, these decisions may be made after the code-blocks have been
compressed so that the impact on implementation memory is limited to the
compressed representation of each block via its embedded bit-stream,
together with some summary information which might be used to assist in
determining good truncation points. Together, this information is generally
significantly smaller than the original image.

Rate-Distortion Optimisation 

In considering rate-distortion optimisation we must consider methods
for minimising overall image distortion, subject to a constraint on the overall
bit-rate, and for minimising bit-rate subject to a constraint on the overall
image distortion. The optimisation task is greatly simplified by considering
only "additive" distortion measures, where the overall distortion, $D$, may be
written as a sum of independent contributions, $D_i$, from each of the code-
blocks, $B_i$. Under these conditions, let $D_i^n$ denote the contribution to the
overall image distortion from code-block $B_i$, when its embedded
representation is truncated to length $R_i^n$. The objective, then, is to find the
set of truncation points, $\{n_i\}$, which minimise

$$D = \sum_i D_i^{n_i} \qquad (1)$$

subject to $R \le R_{\max}$, where $R_{\max}$ is the bit-rate constraint and

$$R = \sum_i R_i^{n_i} \qquad (2)$$

is the overall bit-rate.

It is common to use Mean Squared Error (MSE) as the distortion
measure, primarily because MSE satisfies the additivity property in equation
1 reasonably well. Specifically, let $w_{i,k}$ denote the basis function associated
with sample $s_i[k]$ of block $B_i$ in the space-frequency transform, so that the
original image may be recovered as $\sum_i \sum_k s_i[k]\, w_{i,k}$. Now define

$$\hat{D}_i^n = \|w_i\|^2 \sum_k \left( \hat{s}_i^n[k] - s_i[k] \right)^2 \qquad (3)$$

where $\hat{s}_i^n[k]$ denotes the distorted samples reconstructed from the bit-stream
after truncating block $B_i$'s embedded representation to length $R_i^n$, and $\|w_i\|$
denotes the L2-norm of the basis functions associated with each of the samples in
block $B_i$. (All basis functions, $w_{i,k}$, for block $B_i$ are shifted versions of one
another since the block's samples all belong to the same frequency band of the
space-frequency transform. Consequently, they must all have the same L2-norm.)
Then, setting $D_i^n = \hat{D}_i^n$, it is not hard to show that the additivity
requirement of equation 1 is satisfied with $D$ denoting MSE, provided either
the basis functions, $w_{i,k}$, are all orthogonal to one another, or the individual
sample distortions, $\hat{s}_i^n[k] - s_i[k]$, are uncorrelated. In practical applications,
neither of these assumptions might be strictly true, but the basis functions
are often approximately orthogonal.
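
As a minimal sketch of the additive MSE measure just described, the following Python computes the per-block contribution of equation 3 and sums the contributions over blocks as in equation 1. The function names and the list-based sample representation are assumptions made for illustration only.

```python
from typing import Iterable, Sequence, Tuple


def block_distortion(original: Sequence[float],
                     reconstructed: Sequence[float],
                     basis_norm: float) -> float:
    """Squared-error contribution of one code-block:
    ||w_i||^2 * sum_k (s_hat[k] - s[k])^2."""
    if len(original) != len(reconstructed):
        raise ValueError("sample sequences must have equal length")
    squared_error = sum((r - s) ** 2 for s, r in zip(original, reconstructed))
    return basis_norm ** 2 * squared_error


def total_distortion(
        blocks: Iterable[Tuple[Sequence[float], Sequence[float], float]]) -> float:
    """Additive overall distortion: the sum of the per-block contributions."""
    return sum(block_distortion(s, s_hat, norm) for s, s_hat, norm in blocks)
```

The additivity only holds approximately in practice, for the reasons given above; the sketch simply assumes it.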

Now it is not hard to see that any set of truncation points, $\{n_i^\lambda\}$, which
minimises

$$D(\lambda) + \lambda R(\lambda) = \sum_i \left( D_i^{n_i^\lambda} + \lambda R_i^{n_i^\lambda} \right) \qquad (4)$$

for some $\lambda$, is optimal in the sense that the distortion cannot be reduced
without also increasing the overall bit-rate. Here $D(\lambda)$ and $R(\lambda)$ denote the
overall distortion and bit-rate obtained with these truncation points. Thus, if a
value of $\lambda$ can be found such that the truncation points which minimise
equation 4 yield the target rate, $R(\lambda) = R_{\max}$, exactly, then this set of
truncation points must be an optimal solution to the rate-distortion
optimisation problem. In general, however, it will not be possible to find a
value of $\lambda$ for which $R(\lambda) = R_{\max}$,
since there are only finitely many code-blocks with a finite number of
available truncation points. Nevertheless, if the code-blocks are relatively
small and each block offers a finely embedded bit-stream, it is sufficient in
practice to find the smallest value of $\lambda$ such that $R(\lambda) \le R_{\max}$. Similarly, if one
is interested in minimising the overall bit-rate subject to some constraint on
the distortion, it is sufficient in practice to find the largest value of $\lambda$ such
that $D(\lambda) \le D_{\max}$.
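
The search for the smallest multiplier that meets a rate budget can be sketched as a bisection. This is not the procedure of any particular codec: `best_point`, `total_rate` and `smallest_lambda` are hypothetical names, each block is assumed to be a list of (rate, distortion) pairs ordered by increasing rate, and the per-block minimisation of equation 4 is done by brute force for clarity.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (rate R_i^n, distortion D_i^n)


def best_point(points: Sequence[Point], lam: float) -> int:
    """Index n minimising D_i^n + lam * R_i^n (ties go to the lower rate)."""
    return min(range(len(points)),
               key=lambda n: points[n][1] + lam * points[n][0])


def total_rate(blocks: Sequence[Sequence[Point]], lam: float) -> float:
    """Overall rate R(lam) when every block minimises its own cost."""
    return sum(pts[best_point(pts, lam)][0] for pts in blocks)


def smallest_lambda(blocks: Sequence[Sequence[Point]], r_max: float,
                    iterations: int = 60) -> float:
    """Bisect for the smallest lam with total_rate(blocks, lam) <= r_max."""
    if sum(min(p[0] for p in pts) for pts in blocks) > r_max:
        raise ValueError("rate budget cannot be met by any truncation")
    if total_rate(blocks, 0.0) <= r_max:
        return 0.0
    lo, hi = 0.0, 1.0
    while total_rate(blocks, hi) > r_max:   # grow until the budget is met
        lo, hi = hi, hi * 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if total_rate(blocks, mid) <= r_max:
            hi = mid                        # feasible: try a smaller lam
        else:
            lo = mid
    return hi
```

Because the rate is a non-increasing step function of the multiplier, the bisection converges to the threshold at which the budget is first satisfied.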

It can be demonstrated that the determination of the truncation points,
$n_i^\lambda$, which minimise the expression in equation 4 may be performed very
efficiently, based on a small amount of summary information collected
during the generation of each code-block's embedded bit-stream. It is clear
that this minimisation problem separates into the independent minimisation
of $D_i^{n_i^\lambda} + \lambda R_i^{n_i^\lambda}$, for each block, $B_i$. An obvious algorithm for finding each
truncation point, $n_i^\lambda$, is as follows:

Initialize $n_i^\lambda = 0$ (i.e. no information included for the block)

For $j = 1, 2, 3, \ldots$

    Set $\Delta R_i^j = R_i^j - R_i^{n_i^\lambda}$ and $\Delta D_i^j = D_i^{n_i^\lambda} - D_i^j$
    If $\Delta D_i^j / \Delta R_i^j > \lambda$ then set $n_i^\lambda = j$
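
The per-block procedure above translates almost line for line into Python. In this sketch the function name `truncation_point` is hypothetical, index 0 stands for "nothing included", and the guard against a non-positive rate increment is an added assumption, not part of the stated algorithm.

```python
from typing import Sequence


def truncation_point(rates: Sequence[float],
                     dists: Sequence[float],
                     lam: float) -> int:
    """Return n_i^lam for one block.

    rates[j], dists[j] -- length and distortion at truncation point j,
                          with j = 0 meaning the empty bit-stream.
    """
    n = 0  # initialise: no information included for the block
    for j in range(1, len(rates)):
        delta_rate = rates[j] - rates[n]
        delta_dist = dists[n] - dists[j]
        # Assumption: skip points that add no bytes, to avoid dividing by zero.
        if delta_rate > 0 and delta_dist / delta_rate > lam:
            n = j
    return n
```

For example, with `rates = [0, 10, 20, 30]`, `dists = [100, 90, 30, 25]` and `lam = 3`, point 1 is passed over (slope 1) while point 2 is accepted (slope 3.5 measured from the empty stream).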

Since this algorithm might need to be executed for many different values of
$\lambda$, it makes sense to first identify the subset, $\mathcal{N}_i$, of candidate truncation
points. Let $j_1 < j_2 < \cdots$ be an enumeration of the elements of $\mathcal{N}_i$ and let the
rate-distortion "slopes" for each element be given by $S_i^{j_k} = \Delta D_i^{j_k} / \Delta R_i^{j_k}$, where
$\Delta R_i^{j_k} = R_i^{j_k} - R_i^{j_{k-1}}$ and $\Delta D_i^{j_k} = D_i^{j_{k-1}} - D_i^{j_k}$. Evidently, the slopes must be strictly
decreasing, for if $S_i^{j_{k+1}} > S_i^{j_k}$ then the truncation point, $j_k$, could never be
selected by the above algorithm, regardless of the value of $\lambda$, and so $\mathcal{N}_i$
would not be the set of candidate truncation points. When restricted to the
set, $\mathcal{N}_i$, of truncation points whose slopes are strictly decreasing, the
algorithm reduces to the trivial selection, $n_i^\lambda = \max\{ j_k \in \mathcal{N}_i \mid S_i^{j_k} > \lambda \}$,
so that it is clear that strictly decreasing slope is a sufficient as well as a
necessary condition for the set of candidate truncation points.
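
One way to obtain a candidate set with strictly decreasing slopes is an incremental convex hull pass, sketched below. The names `candidate_truncation_points` and `select` are hypothetical, rates are assumed to be strictly increasing with the truncation index, and index 0 again denotes the empty bit-stream.

```python
from typing import List, Sequence, Tuple


def candidate_truncation_points(rates: Sequence[float],
                                dists: Sequence[float]) -> Tuple[List[int], List[float]]:
    """Return (candidates, slopes) with strictly decreasing slopes."""
    hull = [0]                 # index 0 is the starting point, never a candidate
    slopes: List[float] = []   # slopes[k] belongs to hull[k + 1]
    for j in range(1, len(rates)):
        slope = None
        while True:
            prev = hull[-1]
            d_rate = rates[j] - rates[prev]
            d_dist = dists[prev] - dists[j]
            if d_rate <= 0 or d_dist <= 0:
                slope = None   # j brings no improvement over prev: skip it
                break
            slope = d_dist / d_rate
            if slopes and slope >= slopes[-1]:
                # prev could never be selected once j is available: drop it
                hull.pop()
                slopes.pop()
                continue
            break
        if slope is not None:
            hull.append(j)
            slopes.append(slope)
    return hull[1:], slopes


def select(candidates: Sequence[int], slopes: Sequence[float], lam: float) -> int:
    """Trivial selection: the last candidate whose slope exceeds lam, else 0."""
    chosen = 0
    for j, s in zip(candidates, slopes):
        if s > lam:
            chosen = j
    return chosen
```

Once the candidates and slopes are stored, repeated searches over different multipliers only need `select`, which touches no sample data.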

In a typical implementation of these rate-distortion optimisation ideas
within the context of embedded block coding, the set $\mathcal{N}_i$ is determined using
a conventional convex hull analysis, immediately after the bit-stream for $B_i$
has been generated. The truncation lengths, $R_i^{j_k}$, and slopes, $S_i^{j_k}$, are stored
in a compact form along with the embedded bit-stream until all code-blocks
have been compressed, at which point the search for $\lambda$ and $\{n_i^\lambda\}$, which
minimise distortion subject to a maximum bit-rate (or minimise bit-rate
subject to a maximum distortion) proceeds. The search may be repeated for
each bit-stream layer in the case of the more complex bit-stream
organisations described previously.

Summary of the Invention 

According to a first aspect the present invention consists in a method 
of compressing a digital image including the steps of: 

a) Decomposing the image into a set of distinct frequency bands using a 
space frequency transform; 

b) Partitioning the samples in each frequency band into code blocks; 

c) For each code-block, generating an embedded bit-stream to represent the 
contents of the respective code block; 

d) Determining a rate-distortion optimal set of truncation points, $n_i^l$, for each
code-block, $B_i$, and each quality layer, $l$ (of which there may be only one),
subject to a constraint on the overall bit-rate or distortion for the layer, in a
manner which is sensitive to the masking property of the Human Visual
System (HVS); and

e) Storing the embedded bit-streams for each code-block. 

Preferably, in the compression method, the code-block truncation 
points are selected according to a rate-distortion optimisation criterion, using 
a distortion measure which is sensitive to masking in the HVS. 

Preferably also, contributions to the distortion measure from each
sample in a code block are weighted as a function of a neighbourhood of
samples surrounding the respective sample. In the preferred embodiment the
distortion measure is a weighted sum of the squared errors taken at each
sample, and the weighting function is a function of the magnitudes of the
samples in the respective neighbourhood of samples. To ease the
computational burden, the weighting function may be held constant over a
sub-block of samples, which is preferably selected to have dimensions no
larger than the full size of the respective code block. The samples that are
averaged by the weighting function are preferably taken only from within the
sub-block; however, in some embodiments the samples that are averaged may
also include samples taken from outside the sub-block.
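
To make the idea of a masking-sensitive weighted squared error concrete, the sketch below holds one weight constant over each sub-block and derives it from the magnitudes of the samples inside that sub-block. The specific weighting form, 1 / (sigma^2 + activity^2) with the activity taken as the mean of |s|^rho, and the parameters `sub`, `rho` and `sigma` are assumptions chosen purely for illustration; this section does not specify the weighting function.

```python
from typing import Sequence


def masked_distortion(block: Sequence[Sequence[float]],
                      recon: Sequence[Sequence[float]],
                      sub: int = 2, rho: float = 0.5, sigma: float = 1.0) -> float:
    """Weighted sum of squared errors with one weight per sub-block.

    block, recon -- original and reconstructed samples (lists of rows)
    sub          -- sub-block side length (hypothetical parameter)
    rho, sigma   -- illustrative masking parameters, not taken from the text
    """
    rows, cols = len(block), len(block[0])
    total = 0.0
    for r0 in range(0, rows, sub):
        for c0 in range(0, cols, sub):
            cells = [(r, c)
                     for r in range(r0, min(r0 + sub, rows))
                     for c in range(c0, min(c0 + sub, cols))]
            # Local activity: average compressed magnitude within the sub-block.
            activity = sum(abs(block[r][c]) ** rho for r, c in cells) / len(cells)
            weight = 1.0 / (sigma ** 2 + activity ** 2)  # busier region, smaller weight
            total += weight * sum((recon[r][c] - block[r][c]) ** 2 for r, c in cells)
    return total
```

With these assumed parameters, a unit error on every sample of a flat 2 by 2 sub-block costs 4.0, whereas the same error on a sub-block whose samples all equal 16 costs 4/17, reflecting the smaller visual distortion assigned to active regions.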

In the preferred embodiment, the method is performed by a coding
engine using an algorithm which passes through the block multiple times for
every bit-plane in the magnitude of the samples, starting with the most
significant bit and working down to the least significant bit, the truncation
points being identified with the completion of each coding pass. In the
preferred embodiment, for each code-block, $B_i$, the size of the bit-stream, $R_i^n$,
at each truncation point, $n$, and the change in visual distortion, $\Delta D_i^n$,
between truncation points $n - 1$ and $n$ are determined and this information is
supplied to a convex hull analysis system, which determines the set of
truncation points, $\mathcal{N}_i = \{n_1, n_2, \ldots\}$, which are candidates for the rate-distortion
optimisation algorithm, as well as respective monotonically decreasing rate-
distortion slopes $S_i^n$. Preferably also, summary information, $\mathcal{N}_i$, $R_i^n$ and
$S_i^n$, is stored along with the embedded bit streams for each code block, the
storing process taking place until sufficient information has been stored to
enable truncation points to be determined for each code-block.

In the preferred embodiment, this information is saved until all code- 
blocks in the image have been compressed; however, memory constrained 
applications might choose to begin making truncation decisions before this 
point, subject to the available working memory. 

In the preferred embodiment, the rate-distortion optimal set of
truncation points $n_i^l$ are determined for each code-block with a plurality of layers,
each layer targeted to a distinct bit-rate or distortion level, with each layer
targeting successively higher image quality such that for each successive
layer $l$ the truncation points satisfy $n_i^l \ge n_i^{l-1}$, and the final scalable image bit-
stream is formed by including $R_i^{n_i^l} - R_i^{n_i^{l-1}}$ samples from code-block $B_i$ into
layer $l$, along with respective auxiliary information to identify the number of
samples which have been included for each block and the relevant truncation
points. Preferably also, the coding engine uses an algorithm which passes
through the code block multiple times for every bit-plane in the magnitude of
the samples, starting with the most significant bit and working down to the
least significant bit, the truncation points being identified with the
completion of each coding pass. For each code-block, $B_i$, the size of the bit-
stream, $R_i^n$, at each truncation point, $n$, and the change in visual distortion,
$\Delta D_i^n$, between truncation points $n - 1$ and $n$ are determined and this
information is supplied to the convex hull analysis system to determine the
set of truncation points, $\mathcal{N}_i = \{n_1, n_2, \ldots\}$, which are candidates for the rate-
distortion optimisation algorithm, as well as the monotonically decreasing
rate-distortion slopes, $S_i^n$. The coding engine preferably uses the EBCOT
algorithm as hereinbefore defined.

In the preferred embodiment, all of the code blocks have roughly the
same rectangular size, regardless of the frequency band to which they belong,
and this size is approximately in the range 32x32 to 64x64, where the smaller
end of this size range is generally preferred. Also, in the preferred
embodiment, the block partitioning operation is implemented incrementally,
generating new code blocks and sending them to the block coding system as
the relevant frequency band samples become available.
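
A minimal sketch of the block partitioning, assuming a band described only by its height and width and a hypothetical generator name `partition_band`, might look as follows. Because it is a generator, blocks are produced one at a time, which loosely mirrors the incremental operation described above; edge blocks are simply allowed to be smaller.

```python
from typing import Iterator, Tuple


def partition_band(height: int, width: int,
                   block: int = 32) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (row, col, block_height, block_width) for each code-block
    tiling a frequency band of the given dimensions."""
    for r in range(0, height, block):
        for c in range(0, width, block):
            # Blocks on the bottom and right edges may be truncated.
            yield r, c, min(block, height - r), min(block, width - c)
```

For a 64 by 80 band this yields six code-blocks, the last of which is 32 rows by 16 columns.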

In certain embodiments of the invention the method may be applied to
colour image compression in an opponent colour representation, in which
case distortion from the chrominance channels is scaled differently to the
distortion from the luminance channels prior to the application of the rate-
distortion optimisation procedure. In such embodiments, the distortion
measure is preferably modified to account for masking of chrominance
artefacts by activity in the luminance channel. Preferably also, the distortion
measure is modified to account for cross-channel masking between
chrominance channels in these embodiments.

According to a second aspect the present invention consists in a
method of decompressing a digital image from a compressed bit stream
created by the method set out above, the decompression method including
the steps of:

a) Unpacking the layered compressed bit-stream to recover the truncated
embedded bit-streams corresponding to each code-block;

b) Decoding the embedded bit-stream of each code-block and assembling the
code-blocks into a set of frequency bands; and

c) Synthesising a reconstructed image from the frequency bands through
the inverse transform.

Preferably in the decoding method, the blocks are decoded on demand,
as the relevant frequency band samples are requested by the inverse
transform. Preferably also, the synthesis operation proceeds incrementally,
requesting frequency samples and using them to synthesise new image
samples, as those image samples are requested by the application.
In various embodiments of the invention, the transform may be a
Wavelet transform, a Wavelet packet transform, a Discrete Cosine Transform,
or any number of other space-frequency transforms which will be familiar to
those skilled in the art. In the preferred embodiment of the invention, a
Wavelet transform is used, having the well-known Mallat decomposition
structure. Also, in the preferred embodiment, the transform is implemented
incrementally, producing new frequency band samples whenever new image
samples become available, so as to minimise the amount of image or
frequency band samples which must be buffered in working memory. Either
or both systems may be physically implemented either in hardware or as a
general purpose processor executing software instructions.

Brief Description of the Drawing

Embodiments of the invention will now be described with reference to
the accompanying figure, which is a block diagram of an image compression
and decompression system.

Detailed Description of the Invention

Referring to Figure 1, the specific visual distortion measure which is to
be used will now be described, along with approximations
and techniques for efficient implementation of a visual distortion measure
which is able to effectively exploit visual masking phenomena when used in
conjunction with the rate-distortion optimisation methods outlined in the
Background discussion above.

In Figure 1, flow charts are illustrated for Compression and
Decompression steps when making use of an embodiment of the present
invention. In summary, the compression process includes:

a) Taking the original image 10 and performing a space-frequency
transform 11 such as a wavelet decomposition to produce a set of
frequency bands 12;

b) performing a partitioning 13 of the frequency bands 12 into code
blocks 14;

c) performing a process 15 on the code blocks 14 using an embedded
block coding engine 16 to produce an embedded bit stream 28 and
performing computation 17 on changes in visual distortion metric
and performing a convex hull analysis 18 to produce block
summary information 29;

d) the embedded bit-streams 28 and block summary information are
then stored 19; and

e) truncation points are computed and a final bit-stream composed 20
to produce the compressed, layered image bitstream 21 as an output
of the process.
The decompression process includes: 

a) Performing a recovery process 22 on the compressed image 21 from
the compression process to recover the embedded block bit-stream
30 from the layered image bit stream;

b) Storing 23 the embedded bit stream 30 for each code-block;

c) Passing the embedded bit streams for each code-block through an
embedded block decoding engine 24 to produce code blocks 31;

d) Performing an assembly process 25 to assemble the code-blocks
into frequency bands 32; and

e) Performing an inverse transform 26 on the frequency bands 32 to
produce the reconstructed image 27.

The details of these processes will be expanded upon in the following
detailed description.

The innovative aspect of the present invention is the way in which the
distortion changes, $\Delta D_i^n$, are computed in Figure 1. Specifically, the
innovative aspect involves the exploitation of visual masking properties of
the HVS to improve compression performance. The invention is enabled by
the following four key observations:

1) Most of the benefit which can be achieved by exploiting HVS
characteristics within the context of embedded block coding is obtained
by exploiting intra-band visual masking alone. Moreover, the visual
masking phenomenon can be successfully modelled in a manner which is
independent of viewing distance. This is of great practical importance,
since the viewing distance can rarely be known during compression,
which is the point at which the distortion measure underpinning the rate-
distortion operation must be chosen. By contrast, previous attempts to
exploit HVS properties in scalable image compression systems have
focused on the CSF (Contrast Sensitivity Function), which is inherently
dependent on assumptions concerning the angle subtended by each
reconstructed image pixel at the observer's eye and hence on viewing
distance. The experimental work leading to the present invention has
shown that the benefits which arise from taking the CSF into account are
small by comparison with the benefits which arise from exploiting the
visual masking phenomenon and that a successful masking model need
not be dependent upon assumptions concerning the viewing distance.

2) The spatial extent of the masking phenomenon is comparable (in a very
loose sense) to the size of the code-blocks which can be efficiently coded
independently, at least in the most interesting case when the space-
frequency transform is a Wavelet transform with the conventional Mallat
decomposition structure. This is of the greatest importance because
visual masking is a space-varying phenomenon which depends strongly
upon the local activity in the relevant frequency band, whereas the size of
the blocks which can be efficiently independently coded places a limit on
the opportunity to track these spatial variations by adjusting the
truncation points for each block. The fact that visual masking operates at
a significant distance, rather than affecting only immediate neighbours,
means that it is a slowly varying function of space which can be
effectively tracked within the constraints imposed by code blocks of, say,
32 by 32 samples each. The physical extent of the masking phenomenon
tends to vary in inverse proportion to the spatial frequency associated
with the relevant band. The preferred embodiment of the invention
involves a conventional multi-resolution Wavelet transform, so that the
sampling density for each frequency band also varies in inverse
proportion to the spatial frequency of the band, which means that the
block size should be chosen to be approximately the same in each band.
Experiments with one particular embedded block coding algorithm have
shown that good block coding efficiency can be achieved by using code-
blocks of 32 by 32 or more samples in every band. Block coding
efficiency decreases rapidly as the block size decreases below 32 by 32,
but only slowly as the block size increases beyond this. Moreover, as the
block size increases, implementation memory requirements grow rapidly
and the opportunities to track changes in the masking strength decrease,
so a block size on the order of 32 by 32 is recommended for the preferred
embodiment of the invention.

3) After some minor approximations, it is possible to implement the masking
strength computation very efficiently within the context of embedded
block coding, so that the incorporation of these computations into the
compression system represents a negligible increase in computation and
no increase whatsoever in implementation memory requirements. A
discussion of implementation issues may be found in the Detailed
Description of the Embodiments which follows.

4) By exploiting visual masking in a computationally and memory efficient
manner within the context of embedded block coding, very substantial
improvements in visual image quality can be achieved. Equivalently, for
the same perceived image quality, the required bit-rate can be
substantially reduced by exploiting visual masking. On some images,
reductions of a factor of 2 in the bit-rate have been observed. Moreover,
these visual benefits apply across a wide range of useful bit-rates and
across a range of different resolutions. This is particularly important,
since the image compression system under consideration generates
scalable bit-streams. Application of the invention has so far been
observed only to increase (i.e. never to decrease) overall visual quality,
in a range of different images and bit-rates. It should be noted, however,
that significant visual gains are observed primarily with large images (say
1K by 1K pixels or more) in which substantial spatial variation in the
visual masking effect can be expected.
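
The block partitioning discussed in item 2 above can be sketched as follows.
This is only an illustration under stated assumptions: the function name
`partition_into_code_blocks` and the representation of a frequency band as a
list of rows are hypothetical choices made for the example; only the nominal
32 by 32 block size is taken from the text.

```python
def partition_into_code_blocks(band, block_size=32):
    """Split a 2-D frequency band (a list of rows) into code-blocks.

    The same nominal block size is used whatever band is supplied.
    Blocks on the right and bottom edges may be smaller when the band
    dimensions are not multiples of block_size.
    Returns a list of (row0, col0, block) tuples.
    """
    height = len(band)
    width = len(band[0]) if height else 0
    blocks = []
    for r0 in range(0, height, block_size):
        for c0 in range(0, width, block_size):
            # Slice out the rows, then the columns, of this block.
            block = [row[c0:c0 + block_size]
                     for row in band[r0:r0 + block_size]]
            blocks.append((r0, c0, block))
    return blocks
```

Each returned block could then be handed to the embedded block coder
independently of the others.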

The advantage of the invention is that perceived image quality may be
very substantially improved without significantly affecting implementation
memory or computational complexity, or sacrificing any of the other
desirable features of embedded block coding systems. Equivalently, the
bit-rate required to achieve a given visual image quality may be
substantially reduced. Reductions in the required bit-rate of as much as
2:1 have been observed in some images, when compared against the
conventional optimisation with respect to MSE, as outlined in the
Background discussion above. The visual quality improvements also apply to
all of the following:

1) lower resolution images which might be reconstructed from the same
bit-stream, after discarding the contributions from higher frequency
sub-bands;

2) images reconstructed at a reduced bit-rate after discarding one or more of
the trailing bit-stream layers; and

3) smaller image regions reconstructed after discarding the contribution to
the bit-stream of those code-blocks which do not affect the spatial region
of interest.

Visual Distortion Metric 

Following the notation established in the Background discussion
above, let $s_i[k]$ denote the sequence of samples in code-block $B_i$, let
$\hat{s}_i^n[k]$ denote the representation of these samples which would be
reconstructed if the block's embedded bit-stream were truncated to size
$R_i^n$, and let $\|w_i\|$ denote the L2-norm of any of the transform basis
functions associated with samples in block $B_i$. Then the MSE distortion
measure is given by equation 3. The visual distortion measure which has
been found to substantially improve visual image quality is

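$$D_i^n = \|w_i\|^2 \sum_{k} \left( \frac{\hat{s}_i^n[k] - s_i[k]}{V_i[k]} \right)^2 \qquad (5)$$

where the "visual masking strength", $V_i[k]$, at sample $s_i[k]$, has the form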

Here, $\Phi_i[k]$ denotes a neighbourhood of samples about $s_i[k]$, while
$\|\Phi_i[k]\|$ denotes the number of samples in this neighbourhood.
$\sigma_i$ is a "visibility floor" which models a soft threshold and/or
additive noise in the analogous masking strength "computation" within the
Human Visual System (HVS). Although the subscript suggests that $\sigma_i$
might vary on a block-by-block basis, the blocks have no physical analogy in
the HVS, so we expect that $\sigma_i$ should depend at most upon the
frequency band. It turns out in practice that $\sigma_i$ may be approximated
by a constant visibility floor, i.e. $\sigma_i = \sigma, \forall i$, and that
visual quality is not a strong function of $\sigma$, provided it is small. In
the preferred embodiment of the invention, a typical value for $\sigma$ is
$10^{-4}$. In the preferred embodiment of the invention, the neighbourhood,
$\Phi_i[k]$, should be reasonably extensive and also independent of the
code-block and hence the frequency band which is under consideration. The
exact nature of this neighbourhood, however, is described separately below
under the heading of Efficient Implementation.
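
The masking-weighted distortion described above can be sketched in a few
lines. The sketch rests on assumptions that should be read as such: the exact
form of equation 6 is not reproduced here, so the masking strength is taken to
be the visibility floor plus the neighbourhood mean of the sample magnitudes
raised to the power $p$; the block is treated as one-dimensional for brevity;
and the names `masking_strength`, `visual_distortion` and `radius` are
hypothetical.

```python
def masking_strength(samples, k, radius=4, p=0.5, sigma=1e-4):
    """Assumed form of the masking strength at sample k: the visibility
    floor plus the mean of |s|**p over a neighbourhood about k
    (the neighbourhood is clipped at the block edges)."""
    lo = max(0, k - radius)
    hi = min(len(samples), k + radius + 1)
    neighbourhood = samples[lo:hi]
    return sigma + sum(abs(s) ** p for s in neighbourhood) / len(neighbourhood)


def visual_distortion(samples, reconstructed, basis_norm=1.0, **kwargs):
    """Masking-weighted squared error: each sample's error is divided by
    its masking strength before squaring, and the sum is scaled by the
    squared L2-norm of the basis functions."""
    total = 0.0
    for k, (s, s_hat) in enumerate(zip(samples, reconstructed)):
        v = masking_strength(samples, k, **kwargs)
        total += ((s_hat - s) / v) ** 2
    return basis_norm ** 2 * total
```

Under these assumptions, the same quantization error yields a smaller
distortion value in a region of high activity than in a nearly flat region,
which is the behaviour the metric is intended to capture.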
The absolute interpretation of the sample values, $s_i[k]$, which appear in
equation 6, is important in view of the non-linearity introduced by the
exponent, $p$. In the context of this discussion, the sample values are to be
interpreted relative to a normalisation policy which assigns all samples in
all frequency bands a nominal dynamic range of 1. Thus, in the context of a
conventional Wavelet transform (the preferred embodiment), the context
within which equation 6 is to be interpreted is one in which the image
samples are first normalised to a unit nominal range (i.e. the original image
samples are all divided by $2^\beta$, where $\beta$ is the original sample
bit-depth; in the case of 8-bit images, $\beta = 8$), and the analysis
low-pass filters are all normalised to have a unit DC gain, while the
analysis high-pass filters are all normalised to have a unit gain at the
Nyquist frequency. Bearing in mind that the frequency band samples are
generally symmetric about 0 (we will consider the exceptional case of the
lowest frequency DC band later), this means that the average in equation 6
is bounded and will generally be substantially smaller than that bound.
Under these conditions, and with $\sigma_i$ and $\Phi_i[k]$ both independent
of the frequency band under consideration, the formulation in equation 6 can
be shown to be independent of any assumptions concerning viewing distance,
which is highly desirable in most practical applications.

An obvious generalisation of the neighbourhood averaging in equation
6 would be to form a weighted average, with samples close to $s_i[k]$
weighted more heavily than those further away, with the possibility of
incorporating directional sensitivity into the weights. Generalisations of
this type, however, would introduce substantial increases in implementation
complexity and so they will not be explicitly considered in the present
discussion.

Experience shows that the exponent, $p$, which appears in equation 6,
should be set to about 0.5. The visual masking model embodied by equation
6 is very closely related to those used in the APIC system and in Watson's
work, where $p = 0.6$ and $p = 0.7$ in those cases, respectively. In the
preferred embodiment of the invention, however, $p = 0.5$ is used. This value
has significant computational advantages, since $V_i[k]$ appears in equation
6 through its square. Perhaps even more importantly, the selection of
$p = 0.5$ has been found to yield superior visual image quality, when
considered over a range of different images and bit-rates. Larger values of
$p$ appear to be overly aggressive, particularly at lower bit-rates when
distortion is most visible.
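
The normalisation policy described above (unit nominal sample range, unit DC
gain for the analysis low-pass filters, unit Nyquist gain for the analysis
high-pass filters) can be checked with a short sketch. The helper names are
hypothetical, and the 5-tap and 3-tap filters mentioned in the usage note are
an illustrative pair chosen for the example, not filters named in this
document.

```python
def normalise_samples(image_samples, bit_depth=8):
    """Divide integer image samples by 2**bit_depth so that the nominal
    dynamic range becomes 1."""
    scale = float(2 ** bit_depth)
    return [s / scale for s in image_samples]


def dc_gain(taps):
    """Gain of a filter at zero frequency: the sum of its taps."""
    return sum(taps)


def nyquist_gain(taps):
    """Magnitude of the gain at the Nyquist frequency: the taps summed
    with alternating signs."""
    return abs(sum(t if n % 2 == 0 else -t for n, t in enumerate(taps)))


def normalise_low_pass(taps):
    """Scale a low-pass filter to unit DC gain."""
    gain = dc_gain(taps)
    return [t / gain for t in taps]


def normalise_high_pass(taps):
    """Scale a high-pass filter to unit gain at the Nyquist frequency."""
    gain = nyquist_gain(taps)
    return [t / gain for t in taps]
```

For instance, `normalise_low_pass([-1, 2, 6, 2, -1])` and
`normalise_high_pass([-1, 2, -1])` give a low-pass filter whose taps sum to 1
and a high-pass filter whose alternating-sign sum has magnitude 1.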

All space-frequency transforms generally involve a lowest frequency 
band, which represents the baseband of the original image spectrum. This 
frequency band is anomalous in the sense that the average of sample absolute
values over a neighbourhood is not a good measure of activity within the 
neighbourhood, as it is in the other bands. In this special case, equation 6 
should be replaced by 

where the samples, $s'_i[k]$, are obtained by high-pass filtering the original
sample values, $s_i[k]$, for the band. This is closely related to the visual
masking operator that one would obtain by decomposing the lowest
frequency band using another space-frequency decomposition and averaging
the masking strengths from the bands of this hypothetical decomposition. A
suitable high-pass filter might have the following impulse response:




Efficient Implementation 

Intuitively one would expect that the neighbourhoods, $\Phi_i[k]$, should
contain all samples from the same frequency band as block $B_i$, which lie
within a given distance from the sample, $s_i[k]$. In practice, however, this
means that $V_i[k]$ would have to be computed and its reciprocal taken in
equation 5, for each separate sample, which would clearly increase the
compression system's computation requirements substantially. Division and
reciprocal operators are complex to implement and best to avoid or minimise
wherever possible, particularly in hardware implementations of the system.
In the preferred embodiment of the invention, each code-block, $B_i$, is
partitioned into a collection of sub-blocks, $B_i^j$, and the masking
neighbourhood, $\Phi_i[k]$, is set equal to the sub-block, $B_i^j$, to which
sample $s_i[k]$ belongs. In this case, the masking strength, $V_i[k]$, is
identical for all samples in sub-block $B_i^j$. Let $V_i^j$ denote this
constant value. Then equation 5 becomes



$$D_i^n = \|w_i\|^2 \sum_j \left( \frac{1}{V_i^j} \right)^2 \sum_{k \in B_i^j} \left( \hat{s}_i^n[k] - s_i[k] \right)^2 \qquad (8)$$

Thus, the only increase in complexity over the simple case of MSE is due to
the fact that the value of $(1/V_i^j)^2$ must be computed for each sub-block
and multiplied by the MSE computed for that sub-block. The exponentiation of
each sample's magnitude by $p$, in equation 6, may be implemented with the
aid of a small lookup table in some embodiments, since most of the relevant
information is captured by the position of the most significant bit and a few
additional less significant bits in the binary representation of the sample
magnitudes. This implementation strategy is rendered particularly
economical by the fact that typical implementations of the embedded block
coding engine inherently discover the index of the most significant bit in the
binary representation of each sample's magnitude.
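
The lookup-table strategy just described can be illustrated as follows. This
is a sketch under assumptions, not the scheme of any particular coder: sample
magnitudes are taken to be non-negative integers (e.g. quantizer indices), and
the table sizes `FRAC_BITS` and `MAX_BITS` and the function name `approx_pow`
are hypothetical. Any fixed scaling needed to bring the result to the unit
nominal range is a constant factor and is omitted.

```python
P = 0.5          # exponent used in the preferred embodiment
FRAC_BITS = 3    # hypothetical: bits kept just below the most significant bit
MAX_BITS = 16    # hypothetical: largest magnitude word length supported

# Mantissa table: the value represented by the kept bits lies in
# [1 + f/2^F, 1 + (f+1)/2^F); the interval midpoint is raised to P.
MANTISSA_LUT = [(1.0 + (f + 0.5) / (1 << FRAC_BITS)) ** P
                for f in range(1 << FRAC_BITS)]
# Exponent table: 2**(msb * P) for every possible MSB position.
EXPONENT_LUT = [2.0 ** (e * P) for e in range(MAX_BITS)]


def approx_pow(magnitude):
    """Approximate magnitude**P from the MSB position and the
    FRAC_BITS bits immediately below it."""
    if magnitude == 0:
        return 0.0
    msb = magnitude.bit_length() - 1
    mask = (1 << FRAC_BITS) - 1
    if msb >= FRAC_BITS:
        frac = (magnitude >> (msb - FRAC_BITS)) & mask
    else:
        # Fewer than FRAC_BITS bits below the MSB: pad with zeros.
        frac = (magnitude << (FRAC_BITS - msb)) & mask
    return EXPONENT_LUT[msb] * MANTISSA_LUT[frac]
```

With three bits kept below the most significant bit, the two tables together
hold only 24 entries, and no exponentiation is performed per sample.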

In some embodiments, the complexity of the computation of $V_i^j$ may
be simplified, at the expense of some accuracy in modelling the HVS, by
moving the exponentiation by $p$ outside the summation, to obtain



In the extreme case when the code-block contains only a single sub-block
and the above non-ideal approximation is made, the complexity may be even
further reduced in some particular embodiments. In particular, when the
preferred value of $p = 0.5$ is adopted, as explained above, the distortion for
block $B_i$ corresponding to truncation point $n$, is given by

[Equation 10, giving $D_i^n$ for this single sub-block case]



The division by the number of elements in the block, i.e. $\|B_i\|$, is trivial
if the block size is a power of 2, which is certainly the case in the preferred
embodiment of the invention. Moreover, the remaining division by the value
of $(V_i)^2$ may be folded into the computation of the rate-distortion slopes,
which is described in the Background discussion above. In this case, then,
the complexity of the visual distortion metric is essentially identical to an
MSE computation. In the preferred embodiment of the invention, however,
smaller sub-blocks of size 8x8 have been found to yield the best results and
the approximation of equation 6 by equation 9 is preferably avoided.
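
The sub-block form of the distortion computation can be sketched as below.
The sketch assumes, as a labelled guess rather than a statement of equation 6,
that the masking strength of a sub-block is the visibility floor plus the mean
of the sample magnitudes raised to the power $p$; the function name
`sub_block_distortion` and the list-of-rows block layout are hypothetical.

```python
def sub_block_distortion(block, recon, sub=8, p=0.5, sigma=1e-4,
                         basis_norm=1.0):
    """Visual distortion of a code-block with one masking strength per
    sub x sub sub-block (assumed form: sigma + mean(|s|**p))."""
    height, width = len(block), len(block[0])
    total = 0.0
    for r0 in range(0, height, sub):
        for c0 in range(0, width, sub):
            cells = [(r, c)
                     for r in range(r0, min(r0 + sub, height))
                     for c in range(c0, min(c0 + sub, width))]
            # One masking strength for the whole sub-block.
            v = sigma + sum(abs(block[r][c]) ** p
                            for r, c in cells) / len(cells)
            # Plain squared error accumulated over the sub-block.
            sq_err = sum((recon[r][c] - block[r][c]) ** 2
                         for r, c in cells)
            # One division per sub-block rather than one per sample.
            total += sq_err / (v * v)
    return basis_norm ** 2 * total
```

The inner loop is an ordinary squared-error accumulation; the masking model
contributes a single division per 8x8 sub-block.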
A close examination of the rate-distortion optimisation steps outlined
in the Background discussion should confirm that the distortion, $D_i^n$, is
not used directly; only differences of the form $D_i^{n_1} - D_i^{n_2}$ are of
any significance, where $n_1$ and $n_2$ are two different truncation points.
In the preferred embodiment of the invention, the difference between the
distortion values for each pair of successive truncation points is computed
directly from

$$\Delta D_i^n = D_i^{n-1} - D_i^n = \|w_i\|^2 \sum_j \left( \frac{1}{V_i^j} \right)^2 \sum_{k \in B_i^j} \left[ \left( \hat{s}_i^{n-1}[k] - s_i[k] \right)^2 - \left( \hat{s}_i^n[k] - s_i[k] \right)^2 \right] \qquad (11)$$

Although seemingly complex, this computation requires remarkably little
computational effort for two reasons. Firstly, in the fine embeddings which
are relevant to the invention, each new truncation point represents changes
in only a fraction of the samples in the code-block, so that the MSE reduction
need be computed only for those samples. Secondly, in the specific case of
the EBCOT embedded block coding engine and other related coders, the MSE
reduction for those samples which are affected by a particular coding pass
may be well approximated with the aid of a very small lookup table and
simple integer arithmetic, as carefully explained in the EBCOT document.
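
The first of these two points can be illustrated with a short sketch that
visits only the samples changed by a coding pass. It uses exact squared errors
rather than the lookup-table approximation mentioned above, treats the block
as one-dimensional with equal-length sub-blocks, and assumes the per-sub-block
factors $(1/V_i^j)^2$ have already been computed; the function and parameter
names are hypothetical.

```python
def distortion_reduction(samples, prev_recon, new_recon, changed,
                         inv_v_sq, sub_len, basis_norm=1.0):
    """Reduction in visual distortion between two successive truncation
    points, accumulated over the changed sample indices only.

    inv_v_sq[j] is the precomputed (1 / V)**2 for sub-block j, and
    sub_len is the number of samples in each sub-block.
    """
    delta = 0.0
    for k in changed:
        j = k // sub_len  # index of the sub-block containing sample k
        before = (prev_recon[k] - samples[k]) ** 2
        after = (new_recon[k] - samples[k]) ** 2
        delta += inv_v_sq[j] * (before - after)
    return basis_norm ** 2 * delta
```

Samples untouched by the coding pass contribute nothing to the difference, so
skipping them gives the same result as recomputing the full distortion at both
truncation points and subtracting.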
Generalisations 

Although it is clearly preferable to avoid any dependence on
assumptions concerning the viewing distance, there are applications where
such assumptions can be made with some degree of confidence. For this
reason, the possibility of augmenting the visual masking formulation
expressed above to account for variations in the Contrast Sensitivity
Function (CSF) between frequency bands is by no means excluded. This
amounts to simply scaling all the distortion estimates, $D_i^n$, for
code-blocks in a given frequency band, by a constant factor, where the means
for determining these factors is described elsewhere in the literature and is
not the subject of this present document.

In the same way, the HVS model used to determine $D_i^n$ may be
augmented by the inclusion of inter-band masking effects, rather than just
intra-band masking, and also by the inclusion of local luminance adaptation
effects. These phenomena have all been considered in a different context
within the APIC system and by Watson. In practice, luminance adaptation
effects are partially compensated by the gamma function used in the
representation of most images to which the image compression system
described here is expected to be applied. Also, inter-band masking tends to
be a much weaker phenomenon than intra-band masking. As a result, it is
not clear whether the additional computational and memory costs associated
with attempts to exploit these phenomena are justified in practical
applications. Nevertheless, the possibility that they would be used to
enhance the performance of the intra-band masking formulation described in
this document is by no means excluded.

In some applications, it might be desirable to replace the squared error 
computation in equation 8 by an absolute error computation of the form: 



Modifications of this form might be considered for the sake of computational
complexity, with relatively little effect on visual image distortion.

When colour images are to be compressed, the visual masking
formulation described above may be applied to all three colour channels and
the rate-distortion optimisation methods discussed in the Background
description above may then be applied jointly to all code-blocks representing
the image. Better results may be obtained, however, if the distortions (or
equivalently, the rate-distortion slopes) associated with the code-blocks from
each colour channel are first scaled by an amount which reflects the visual
importance of that channel. For example, an opponent colour space such as
YUV, YIQ or Lab is commonly used for colour image compression and the
luminance channel in any of these representations generally has greater
visual significance than the chrominance channels. Inter-channel masking is
probably not insubstantial so that the visibility of chrominance distortion is
affected by activity in the same frequency band of the luminance channel.
With some small increase in computational and memory resources, these
effects may be accommodated within the visual distortion estimates, $D_i^n$.
Specifically, for chrominance component code-blocks, equation 8 may be
replaced by
where the luminance sub-block appearing in this expression is that whose
location and frequency band correspond to the chrominance sub-block,
$B_i^j$. The parameter a is then determined to maximise perceived visual
quality over a wide range of colour images and bit-rates. Extensions to
include cross-channel masking between the chrominance channels are
straightforward, but with diminishing return.
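
The simpler of the colour measures described above, scaling each channel's
distortions before joint rate-distortion optimisation, can be sketched as
follows. The weights shown are purely hypothetical placeholders chosen so that
luminance counts for more than chrominance; no values are given in this
document, and the names `CHANNEL_WEIGHTS` and `scale_by_channel` are likewise
assumptions. Cross-channel masking is not modelled.

```python
# Hypothetical weights: luminance (Y) is given more visual importance
# than the chrominance channels (U, V). These values are placeholders.
CHANNEL_WEIGHTS = {"Y": 1.0, "U": 0.4, "V": 0.4}


def scale_by_channel(distortions, weights=CHANNEL_WEIGHTS):
    """Scale per-block distortion estimates by a channel weight.

    distortions: list of (channel, block_id, distortion) tuples.
    Returns a new list with each distortion multiplied by the weight
    of its channel, ready for joint optimisation over all code-blocks.
    """
    return [(channel, block_id, weights[channel] * d)
            for channel, block_id, d in distortions]
```

Because the rate-distortion slopes are linear in the distortion, the same
weights could equivalently be applied to the slopes.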

It will be appreciated by persons skilled in the art that numerous
variations and/or modifications may be made to the invention as shown in
the specific embodiments without departing from the spirit or scope of the
invention as broadly described. The present embodiments are, therefore, to
be considered in all respects as illustrative and not restrictive.
CLAIMS

1. A method of compressing a digital image including the steps of: 

a) Decomposing the image into a set of distinct frequency bands using a
space frequency transform;

b) Partitioning the samples in each frequency band into code blocks;

c) For each code-block, generating an embedded bit-stream to represent
the contents of the respective code block;

d) Determining a rate-distortion optimal set of truncation points, for
each code-block, $B_i$, and each quality layer, $l$, subject to a constraint
on the overall bit-rate or distortion for the layer in a manner which is
sensitive to the masking property of the Human Visual System (HVS);
and

e) Storing the embedded bit-streams for each code-block.

2. The method as claimed in claim 1, wherein the code block truncation
points are selected according to a rate-distortion optimisation criterion,
using a distortion measure which is sensitive to masking in the HVS.

3. The method as claimed in claim 1 or 2, wherein contributions to the
distortion measure from each sample in a code block are weighted as a 
function of a neighbourhood of samples surrounding the respective sample. 

4. The method as claimed in claim 3, wherein the weighting function is a
function of the magnitudes of the samples in the respective neighbourhood of 
samples. 

5. The method as claimed in claim 3 or 4, wherein the weighting function 
is held constant over a sub-block of samples. 
6. The method as claimed in claim 5, wherein the sub-block of samples
has dimensions no larger than the full size of the respective code block. 
7. The method as claimed in claim 5 or 6, wherein the weighting function 
is a function of the magnitude of samples taken only from within the sub- 
block of samples. 

8. The method as claimed in any one of claims 1 to 7, wherein the
distortion error is a weighted sum of the squared errors taken at each sample.
9. The method as claimed in any one of claims 1 to 8, wherein the
method is performed by a coding engine using an algorithm which passes
through the block multiple times for every bit-plane in the magnitude of the
samples, starting with the most significant bit and working down to the least
significant bit, the truncation points being identified with the completion of
each coding pass.

10. The method as claimed in claim 9, wherein, for each code-block, $B_i$,
the size of the bit-stream, $R_i^n$, at each truncation point, $n$, and the
change in visual distortion, $\Delta D_i^n$, between truncation points $n-1$
and $n$ are determined and this information is supplied to a convex hull
analysis system, which determines the set of truncation points,
$N_i = \{n_1, n_2, \ldots\}$, which are candidates for the rate-distortion
optimisation algorithm, as well as respective monotonically decreasing
rate-distortion slopes $S_i^n$.

11. The method as claimed in claim 10, wherein summary information,
$N_i$, $R_i^n$ and $S_i^n$, is stored along with the embedded bit streams for
each code block, the storing process taking place until sufficient information
has been stored to enable truncation points to be determined for each
code-block.

12. The method as claimed in claim 11, wherein the summary information
is saved until all code-blocks in the image have been compressed.

13. The method as claimed in claim 12, wherein a truncation decision is
made before all code-blocks in the image have been compressed, and the
summary information is saved only for those code blocks for which
truncation decisions have not yet been made.

14. The method as claimed in any one of claims 11 to 13, wherein the
rate-distortion optimal set of truncation points $n_i^l$ are determined for
each code block with a plurality of layers, each layer targeted to a distinct
bit-rate or distortion level, with each layer targeting successively higher
image quality such that for each layer $l$ there are $n_i^l$ truncation
points, and the final scalable image bit-stream is formed by including
$R_i^{n_i^l} - R_i^{n_i^{l-1}}$ samples from code-block $B_i$ into layer $l$,
along with respective auxiliary information to identify the number of samples
which have been included for each block and the relevant truncation points.

15. The method as claimed in any one of claims 10 to 14, wherein the
coding engine uses an algorithm which passes through the code block
multiple times for every bit-plane in the magnitude of the samples, starting
with the most significant bit and working down to the least significant bit;
the truncation points being identified with the completion of each coding
pass.

16. The method as claimed in claim 15, wherein, for each code-block, $B_i$,
the size of the bit-stream, $R_i^n$, at each truncation point, $n$, and the
change in visual distortion, $\Delta D_i^n$, between truncation points $n-1$
and $n$ are determined and this information is supplied to the convex hull
analysis system to determine the set of truncation points,
$N_i = \{n_1, n_2, \ldots\}$, which are candidates for the rate-distortion
optimisation algorithm, as well as the monotonically decreasing
rate-distortion slopes, $S_i^n$.

17. The method as claimed in claim 15 or 16, wherein the coding engine
uses the EBCOT algorithm as hereinbefore defined.

18. The method as claimed in any one of claims 1 to 17, wherein all of the 
code blocks have substantially the same size, independently of the frequency 
band to which they belong. 

19. The method as claimed in claim 18, wherein the size of the code 
blocks is substantially in the range 32x32 to 64x64. 

20. The method as claimed in any one of claims 1 to 19, wherein the block 
partitioning operation is implemented incrementally, generating new code 
blocks and sending them to the block coding system as the relevant 
frequency band samples become available. 

21. The method as claimed in any one of claims 1 to 20, wherein the
method is applied to colour image compression in an opponent colour
representation, and wherein distortion from the chrominance channels is
scaled differently to the distortion from the luminance channels prior to the
application of the rate-distortion optimisation procedure.

22. The method as claimed in claim 21, wherein the distortion measure is 
modified to account for masking of chrominance artefacts by activity in the 
luminance channel. 

23. The method as claimed in claim 22, wherein the distortion measure is 
modified to account for cross-channel masking. 

24. The method as claimed in any one of claims 1 to 23, wherein the space
frequency domain transform is one selected from a Wavelet transform, a
Wavelet packet transform, a Discrete Cosine transform, or a Fourier
transform.

25. The method as claimed in any one of claims 1 to 23, wherein a Wavelet
transform is used, having a Mallat decomposition structure.

26. The method as claimed in any one of claims 1 to 25, wherein the
transform is implemented incrementally, producing new frequency band
samples whenever new image samples become available to minimise the
quantity of image or frequency band samples which must be buffered in
working memory.

27. A method of decompressing a digital image from a compressed bit
stream created by the method as claimed in any one of claims 1 to 26, the
decompression method including the steps of:

Unpacking the layered compressed bit-stream to recover the truncated
embedded bit-streams corresponding to each code-block.
Decoding and assembling the code-blocks into a set of frequency bands.
Synthesising a reconstructed image from the frequency bands through the
inverse transform.

28. The method as claimed in claim 27, wherein the blocks are decoded on 
demand, as the relevant frequency band samples are requested by the inverse 
transform. 

29. The method as claimed in claim 27 or 28, wherein the synthesis
operation proceeds incrementally, requesting frequency samples and using
them to synthesise new image samples, as those image samples are requested
by the application.